301 research outputs found

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

Author: Agnes Frederic
Besacier Laurent
Ferrero Jeremy
Schwab Didier
Publication date: 01/01/2017

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity with a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on Track 4a, with a correlation of 83.02% with human annotations.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Deep Investigation of Cross-Language Plagiarism Detection Methods

Author: Agnes Frederic
Besacier Laurent
Ferrero Jeremy
Schwab Didier
Publication date: 24/05/2017

This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages. Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora) co-located with ACL 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

ANT COLONY ALGORITHM APPLIED TO AUTOMATIC SPEECH RECOGNITION GRAPH DECODING

Author: Lecouteux Benjamin
Schwab Didier
Publication venue: HAL CCSD
Publication date: 01/01/2015

In this article we propose an original approach that decodes Automatic Speech Recognition (ASR) graphs using a constructive algorithm based on ant colonies. In classical approaches, when a graph is decoded with higher-order language models, the algorithm must expand the graph in order to develop each newly observed n-gram. This expansion increases computation time and memory consumption. We propose to use an ant colony algorithm to explore ASR graphs with a new language model, without having to expand them. We first present results on the TED English corpus, where 2-gram graphs are decoded with a 4-gram language model. We then show that our approach performs better than a conventional Viterbi algorithm when computing time is constrained, and that it allows a highly threaded decoding process with a single graph and strict control of computation time and memory consumption.

Hal - Université Grenoble Alpes

Modelling, Detection And Exploitation Of Lexical Functions For Analysis.

Author: Lafourcade Mathieu
Schwab Didier
Publication date: 01/11/2006

Lexical functions (LF) model relations between terms in the lexicon. These relations can be knowledge about the world (Napoleon was an emperor) or knowledge about the language (‘destiny’ is a synonym of ‘fate’).

CiteSeerX

Repository@USM

Lexical Functions For Ants Based Semantic Analysis.

Author: Lafourcade Mathieu
Schwab Didier
Publication date: 01/06/2007

Semantic analysis (SA) is a central operation in natural language processing. We can consider it as the resolution of 5 problems: lexical ambiguity, references, prepositional attachments, interpretative paths and lexical function instantiation.

Repository@USM

Ant Colony Algorithm Applied to Automatic speech Recognition Graph Decoding

Author: Lecouteux Benjamin
Schwab Didier
Publication venue: HAL CCSD
Publication date: 01/01/2015

Extension lexicale de définitions grâce à des corpus annotés en sens

Author: Schwab Didier
Tchechmedjiev Andon
Vial Loïc
Publication venue: HAL CCSD
Publication date: 04/07/2016

English title: Lexical Expansion of Definitions Based on Sense-Annotated Corpora. For many natural language processing tasks and applications, it is necessary to determine the semantic relatedness between senses, words or text segments. In this article, we focus on a knowledge-based measure, the Lesk measure, which is among the most commonly used. The similarity between two senses is computed as the number of overlapping words in the definitions of the senses from a dictionary. We study the expansion of definitions through the use of sense-annotated corpora. The idea is to take into account the words that are most frequently used around a particular sense and to use the top of the frequency distribution to extend the corresponding definition. We show improved performance on a Word Sense Disambiguation task, surpassing the state of the art.
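The overlap count and the corpus-based expansion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tiny stopword list, the toy definitions and the context counts are all hypothetical, chosen only to show the idea of counting shared words and then appending the most frequent context words to a definition.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "or", "and", "to", "that", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords; returns a set of words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def lesk_overlap(text_a, text_b):
    """Lesk-style similarity: the number of distinct words shared by two texts."""
    return len(tokenize(text_a) & tokenize(text_b))


def expand_definition(definition, context_counts, top_k=2):
    """Append the top_k most frequent context words of a sense to its definition."""
    extra = [word for word, _ in context_counts.most_common(top_k)]
    return definition + " " + " ".join(extra)


# Toy dictionary definitions for two senses of "bank" (invented for illustration).
bank_money = "institution that accepts deposits and lends money"
bank_river = "sloping land beside a body of water"

# Invented counts of words seen around the river sense in a sense-annotated corpus.
bank_river_contexts = Counter({"river": 5, "fishing": 3, "boat": 1})

context = "fishing near the water"
plain_score = lesk_overlap(bank_river, context)
expanded_score = lesk_overlap(expand_definition(bank_river, bank_river_contexts), context)
```

With these toy inputs, the expanded definition shares one more word with the context ("fishing") than the plain definition does, which is the effect the expansion is meant to produce.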

Hal - Université Grenoble Alpes

Sense Embeddings in Knowledge-Based Word Sense Disambiguation

Author: Lecouteux Benjamin
Schwab Didier
Vial Loïc
Publication venue: HAL CCSD
Publication date: 01/01/2017